Abstract
We investigated whether the “unity assumption,” according to which an observer assumes that two different sensory signals refer to the same underlying multisensory event, influences the multisensory integration of audiovisual speech stimuli. Syllables (Experiments 1, 3, and 4) or words (Experiment 2) were presented to participants at a range of different stimulus onset asynchronies using the method of constant stimuli. Participants made unspeeded temporal order judgments regarding which stream (either auditory or visual) had been presented first. The auditory and visual speech stimuli in Experiments 1–3 were either gender matched (i.e., a female face presented together with a female voice) or gender mismatched (i.e., a female face presented together with a male voice). In Experiment 4, different utterances from the same female speaker were used to generate the matched and mismatched speech video clips. Measured in terms of the just noticeable difference, performance in all four experiments showed that participants found it easier to judge which sensory modality had been presented first when evaluating the mismatched stimuli than when evaluating the matched-speech stimuli. These results therefore provide the first empirical support for the “unity assumption” in the domain of the multisensory temporal integration of audiovisual speech stimuli.
Additional information
A.V. was supported by a Newton Abraham Studentship from the Medical Sciences Division, University of Oxford.
Cite this article
Vatakis, A., Spence, C. Crossmodal binding: Evaluating the “unity assumption” using audiovisual speech stimuli. Perception & Psychophysics 69, 744–756 (2007). https://doi.org/10.3758/BF03193776